# Mobsql

[![builds.sr.ht status](https://builds.sr.ht/~mil/mobroute.svg)](https://builds.sr.ht/~mil/mobroute?)

Mobsql is a Go library and commandline application
which facilitates loading one or multiple [Mobility
Database](https://github.com/MobilityData/mobility-database-catalogs)
source [GTFS](https://gtfs.org/schedule/) archives into a
SQLite database. Its internal SQLite schema mirrors GTFS's spec
but adds a source field to each table (thus allowing multiple
sources to be loaded to the database simulatenously). Mobsql's API
is designed to be as simple as possible: the user defines [a filter
specification](#strongsouceset-filter-functionalitystrong)
to determine which source(s) should be addressed, and from there can
download, compute, purge, and query the status(es) of the specified one
or more Mobility DB sources with a single command or library call.

While primarily developed to be used by
[Mobroute](http://git.sr.ht/~mil/mobroute), the minimal GTFS router and
related project, mobsql itself is a fully independent tool and can be
used as a standalone general-purpose GTFS-to-SQLite ETL & import utility
(either via CLI or as a Go library).

**Supported functionality:**
- Downloads & imports GTFS ZIP archives into a local SQLite database with
  an additional 'source' field in all GTFS tables correlating to [mdb_source_id](https://github.com/MobilityData/mobility-database-catalogs#gtfs-schedule-schema);
  thus allowing multiple sources to be stored without conflict.
- Supports bulk import (e.g. 1-insert-for-multiple rows) functionality to
  decrease load time.
- Supports a caching system, storing a checksum of the imported GTFS table(s)
  such that successive imports on the same data nops effectively if there
  would be no net change.
- In addition to downloading functionality, can be used to compute contrived
  data (similar to materialized views), purge, and query status information
  about sources.
- Features specification of a filter to determine
  a ['sourceset'](#strongsouceset-filter-functionalitystrong) of Mobility
  Database sources to import rather then making the user manually look up
  mdbids.
- Simply models the database schema including: GTFS specification import rules,
  Mobility Database catalog (CSV) imported as table, internal tracking tables, and SQLite views in
  a single file - [see schema datastructure](./dbschema/schema.go).
- Implements [conversion logic](./dbschema/converters.go#L11)
  building atop the [GTFS schema](./dbschema/schema.go) for fields which can
  be stored more efficiently (such as stop_times's departure_time/arrival_time
  as integers) rather then as colon time strings.
- Allows the creation of SQL views (currently used for internal tracking).
- Implements automatic index creation based on [schema specification](./dbschema/schema.go).
- Implements automatic creation of computed tables based on view logic (e.g.
  similar to the concept of materialized views).

**Planned functionality per roadmap** (but not-yet implemented):
- Implement loading of non-Mobility Database GTFS (e.g. local) ZIP archive
  sources.
- Implement cleanup logic to gracefully handle partial / interrupted loads
  (e.g. killing process during GTFS zip extraction & SQL import / operations).

## **Installation**
Mobsql functions as both a commandline and as a Go library.

- Installation for usage a CLI application:
  - Clone repo: `git clone https://git.sr.ht/~mil/mobsql`
  - Build executable: `go build -tags=sqlite_math_functions cli/mobsql.go`
  - Run via: `./mobsql`
  - *Optionally install to $PATH etc.*: `cp mobsql /usr/local/bin`
  - See [Commandline Documentation](#strongcommandline-documentationstrong) for further information on usage
- Installing as Go module to use in your Go project as a library:
  - Change directory to your go project's root directory: `cd foo`
  - Add via go get: `go get git.sr.ht/~mil/mobsql`
  - See [Go Module Documentation](#stronggo-module-documentationstrong) for information on Go library / module API


## **Souceset Filter Functionality**
Mobsql's commands (purge, load, compute, status) are built
around the idea of addressing a (subset) of the [Mobility Database
Catalog](https://github.com/MobilityData/mobility-database-catalogs) with
a single command. Rather then manually specifying the specific sources
you want to address by the identifier in the Mobility Database (mdbid)
in cases where you want to work with multiple sources, you instead define
a 'sourceset filter' which is translated to a 'sourceset' of mdbids.

The basic logic behind the sourceset filter concept is:

- `{country: FR, municipality: Paris}` (filter) -> `[1026 1069 1159 1283 1291 1314]`(mdbids)

By default (with a sourceset filter specified as `{}`); the entire
catalog is addressed (e.g. all ~1400 sources). As such, in almost all
cases you will want to specify particular properties on the filter to
more specifically address whichever sources you wish to perform the
requested command / operation on. Some examples might be that maybe you
want might want to load all GTFS sources for France (`{country: FR}`),
2 specific mdbids (`mdbid: [510, 515]`), or address only sources you've
already loaded previously in the database (`{loaded: true}`) so that
you can update these sources.

In the [CLI](#strongcommandline-documentationstrong) the sourceset
filter is expected to be passed as a YAML object (see below);
and meanwhile for the Go API, the sourceset filter is passed [as a
struct](https://pkg.go.dev/git.sr.ht/~mil/mobsql) for the Load, Compute,
Status, and Purge commands.

**YAML Sourceset Filter Specification:**

Filter is defined as a YAML object. Note all string properties use a
substring incasesensitive match policy.  The following properties are
valid and will be use in a combinatory fashion (e.g. properties translate
to AND conditions):

- `country`: References the country per `location.country_code` in [schema](https://github.com/MobilityData/mobility-database-catalogs#gtfs-schedule-schema) (string)
- `municipality`: References the municipality per `location.municipality` in [schema](https://github.com/MobilityData/mobility-database-catalogs#gtfs-schedule-schema) (string)
- `subdivision`: References the subdivision per `location.subdivision_name` in [schema](https://github.com/MobilityData/mobility-database-catalogs#gtfs-schedule-schema) (string)
- `mdbid`: References the unique mobility database source id per `mdb_source_id` in [schema](https://github.com/MobilityData/mobility-database-catalogs#gtfs-schedule-schema) ([]int)
- `provider`: References the unique mobility database provider `provider` in [schema](https://github.com/MobilityData/mobility-database-catalogs#gtfs-schedule-schema) (string)
- `name`: References the unique mobility database provider `name` in [schema](https://github.com/MobilityData/mobility-database-catalogs#gtfs-schedule-schema) (string)
- `glob`: Wildcard match {country,municipality,subdivision,provider,name} (sring)
- `coords`: Bounding box match ensuring contains coordinates ([][2]float)
- `maxkm`: Bounding box match ensuring bbox area is less then specification (float)

**YAML Sourceset Filter Examples**:
- `{mdbid: [510, 516]}`: Mdbid is 510 or 516
- `{country: BE}`: BE (Belgium) country matches
- `{municipality: paris}`: Paris municipality matches
- `{subdivision: ontario}`: Ontario subdivsion matches
- `{glob: foo}`: Matches for {country,municipality,subdivision,provider,name} containing subtring foo
- `{coords: [[40.512764, -74.251961]]}`: Matches where coordinate is within source bounding box
- `{maxkm: 20}`: Matches with bounding box maximum size of 20km (kilometers)
- `{loaded: true}`: Matches where sources are in DB/ previously downloaded (useful for updates, querying etc.)
- `{loaded: true, country: FR}`: Matches where source previously loaded & country matches FR
- `{}`: All sources / entire catalog
 
## **Commandline Documentation**
Commandline documentation is available via running `mobsql -h`; for reference
the generated documentation is available below:

```
Usage:
  ./mobsql [FLAGS]

Flags:
  -c string
       Command: should be one of {status,load,compute,purge}; note, you must pass -f when using -c.
       ---
       load: Load downloads the associated GTFS zip archives from the MobilityDB's
       URL; and then performs an import, pulling each GTFS Zip CSV file into
       the local SQLite database. Note that this operation is cached (e.g.
       subsequent requests with the same source will skip tables already imported
       if underlying GTFS CSV file checksums are unchanged).
    
       compute: Compute recomputes the application's computed tables (contrived from the
       source loaded GTFS tables) specified in application's configuration. Each
       computed table is based on a underlying view (see config). The concept of
       a computed table itself is similar to concept of a SQL materialized view;
       however 'recomputation' occurs on a per-source level. So you don't need
       to 'rematerialize' an entire table.
    
       status: Status queries the database for sources that match the given sourceset.
       This is a read-only operation which can be used to determine the effected
       sources for subsequent requests with the same sourceset.
    
       purge: Purge removes (deletes) all associated data from underlying GTFS tables
       in the SQLite database for the given sourceset.
    
  -db string
       Location of SQLite DB path; if unset uses ~/.cache/mobroute/sqlite.db
  -f string
       YAML filter specification to apply to -c command.
       Pass as a YAML object containing optional properties:
         {mdbid, country, municipality, subdivision, provider, name, glob, bbox, maxkm, loaded}
       Example YAML filter specifications:
         -f '{}'
         -f 'mdbid: [510, 516]'
         -f 'country: BE'
         -f 'municipality: paris'
         -f 'subdivision: ontario'
         -f 'glob: foo'
         -f 'coords: [[40.512764, -74.251961]]'
         -f 'maxkm: 20'
         -f 'loaded: true'
         -f '{glob: foo, country: US}'
    
  -h   Display help information
  -v   Display version information
  -vd
       Enable verbose debug messages
  -vi
       Enable verbose info messages (default true)
  -vvv
       Enable verbose info+warnings+debug messages (equivilant to -vi + -vw + -vd)
  -vw
       Enable verbose warnings messages
Examples:
  mobsql -f '{}' -c status (View all sources status)
  mobsql -f '{loaded: true}' -c status (View all loaded sources status)
  mobsql -f '{glob: nyc}' -c status (View all sources matching glob of nyc)
  mobsql -f '{country: FR}' -c load (Load all souces matching France country)
  mobsql -f '{country: FR}' -c purge (Purge database of all sources matching France country)
  mobsql -f '{}' -c purge (Purge database of all sources)
  mobsql -h (Display help text)
  mobsql -v (Display version)
```

## **Go Module Documentation**
Documentation for the Go module is available through pkg.go.dev. Note that the
major functionalities of the go package (Load, Compute, Purge, and Status) are
also available through the [commandline application](#strongcommandline-documentationstrong).

[https://pkg.go.dev/git.sr.ht/~mil/mobsql](https://pkg.go.dev/git.sr.ht/~mil/mobsql)
